System for comparing information items to determine similarity therebetween



Feb. 20, 1962  W. E. DICKINSON  3,022,005
SYSTEM FOR COMPARING INFORMATION ITEMS TO DETERMINE SIMILARITY THEREBETWEEN
Filed Jan. 12, 1959  2 Sheets-Sheet 1

United States Patent Office

SYSTEM FOR COMPARING INFORMATION ITEMS TO DETERMINE SIMILARITY THEREBETWEEN

Wesley E. Dickinson, County of Santa Clara, Calif., assignor to International Business Machines Corporation, New York, N.Y., a corporation of New York

Filed Jan. 12, 1959, Ser. No. 786,254

7 Claims. (Cl. 235-152)

This invention relates in general to comparison systems and in particular to a system for comparing an unknown quantity of information with a library of known quantities for selecting those quantities which have a predetermined degree of similarity to the unknown quantity. The invention has particular utility in processing information which may be represented either graphically or by an n-dimensional vector representation.

Many applications exist where it is desirable to compare one quantity, which is defined by a large number of parameters, with known quantities defined by similar parameters. One such application exists in the field of spectrum analysis, wherein identification of an element present in a sample of an unknown compound may be accomplished by comparing the spectrum of the unknown with each of the known spectra, since each element has a characteristic spectrum. Because of the relatively large number of known elements, the library of their spectra is quite large. In addition, since the spectrum of each element is defined by a relatively large number of parameters, it is quite impractical to make a manual comparison of an unknown spectrum with each known spectrum.

Various attempts have been made in the past to mechanize this manual comparison operation. In one known arrangement, standard accounting cards are employed for storing the parameters which define the spectrum of the known elements, each column of a card corresponding to a particular wavelength of the spectrum and the rows designating the amplitude of light transmitted by the particular element at the respective wavelengths. In order to compare an unknown spectrum with the library of known spectra, the cards are sorted by a conventional accounting card sorter, column by column, in accordance with the various parameters of the unknown spectrum. In some instances it is possible to eliminate sorting many columns; however, because of the quantity of cards and the number of columns which must be considered, each comparison operation still requires considerable time.

It will be seen that this type of sort-to-compare operation involves a one-to-one comparison which, in many applications, is not always necessary and at times is not desired. For example, in spectrum analysis it is more desirable to have a comparison operation which is based on the goodness-of-fit concept, as distinguished from a direct or one-to-one correspondence of parameters. It is of course possible to achieve the effect of a goodness-of-fit sorting approach by conventional sorting apparatus, but this increases the time required for each comparing operation as the degree of correspondence desired becomes less.

It has been found, in accordance with the present invention, that a comparison system may be provided in which an unknown multi-parameter quantity may be compared with a library of similarly defined quantities to determine, in a single scan through the library, those quantities which have a predetermined, selectable degree of correspondence to the unknown quantity.

The system of the invention comprises generally a library having a plurality of unit records corresponding to known quantities, each of which may be defined by a plurality of parameters, each record having means for storing digital representations corresponding to each of the parameters. Means are provided for converting the digital representation of a parameter to a corresponding analog voltage level. Means are also provided for weighting each of these analog voltage levels in accordance with a value predetermined by the corresponding parameter of the unknown quantity, these weighted voltages being summed and applied to one input terminal of a differential amplifier.

The system further includes means for generating a first control voltage corresponding to the square root of the sum of the squares of the digital representations stored on the record, and means for weighting this control voltage in accordance with the square root of the sum of the squares of the predetermined weights of the unknown quantity. The weighted control voltage is applied to the other terminal of the differential amplifier. The output voltage of the differential amplifier will be at a maximum when the input voltages are equal. Stated somewhat differently, the output voltage will be at a maximum when each of the parameters of the unknown quantity corresponds respectively to each parameter of one of the records in the library. In addition, the output voltage will also be at a maximum when the respective parameters have the same proportion. A level sensing device is provided to determine when the output voltage of the differential amplifier reaches a predetermined level, so that an indicating signal may be generated. In order that unit records having parameters not precisely the same as the unknown parameters may also provide an indicating signal in accordance with some predetermined desired degree of correspondence, means are further provided for adjusting the weighted control voltage applied to the differential amplifier.

Summarily, the system may be considered a mechanization of the equation:

$$R = \frac{\sum_{n=0}^{99} D_n W_n}{K\sqrt{\sum_{n=0}^{99} D_n^2}\,\sqrt{\sum_{n=0}^{99} W_n^2}} \qquad (1)$$

where D is the amplitude of a particular parameter of a known spectrum, W is the amplitude of the corresponding parameter of the unknown, and K is the degree-of-correspondence factor, D being represented by a voltage value and W by a conductance value. R is the correlation coefficient between the known spectrum and the unknown spectrum and, with K = 1, ranges between 0 and 1, where 1 represents perfect correlation.
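As a rough software analogue of Equation 1 (not part of the disclosed apparatus), the ratio can be computed directly from two lists of amplitudes. This minimal Python sketch places K in the denominator, as the later description of the percentage match potentiometer implies; the function name `equation_1` is hypothetical.

```python
import math
from typing import Sequence


def equation_1(d: Sequence[float], w: Sequence[float], k: float = 1.0) -> float:
    """Evaluate R = sum(Dn*Wn) / (K * sqrt(sum Dn^2) * sqrt(sum Wn^2)).

    d: descriptors of one known record; w: weights taken from the unknown;
    k: degree-of-correspondence factor (1.0 stands for a 100 percent setting).
    """
    if len(d) != len(w):
        raise ValueError("known and unknown must have the same number of parameters")
    numerator = sum(dn * wn for dn, wn in zip(d, w))
    denominator = (k * math.sqrt(sum(dn * dn for dn in d))
                   * math.sqrt(sum(wn * wn for wn in w)))
    # An all-zero record or unknown gives no meaningful ratio; treat it as no match.
    return numerator / denominator if denominator else 0.0
```

For example, `equation_1([1, 2, 3], [2, 4, 6])` evaluates to 1.0 up to floating-point rounding, because the two lists are in the same proportion.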

It is therefore an object of the present invention to provide an improved system for comparing quantities having a plurality of parameters.

Another object of the present invention is to provide an improved comparison system in which the degree of comparison desired may be varied.

A further object of the present invention is to provide a comparison system in which an unknown quantity which is defined by a plurality of parameters may be compared with a library of known quantities defined by similar parameters to determine, in a single scan operation, the known quantities which have a predetermined degree of correspondence to the unknown quantity.

Other objects of the invention will be pointed out in the following description and claims and illustrated in the accompanying drawings, which disclose, by way of example, the principle of the invention and the best mode which has been contemplated of applying that principle.

In the drawings:

FIG. 1 is a diagrammatic view of a comparison system embodying the present invention.

FIG. 2 is a graph illustrating the spectrum of an unknown element.

FIGS. 3 through 5 are schematic views of the various components of the system shown diagrammatically in FIG. 1.

Referring to the drawings and particularly to FIG. 1, the comparison system illustrated therein is adapted to compare the spectrum of an unknown element against a library of known spectra to determine those elements which have a desired degree of correspondence to the unknown element. It should be noted that while the system of the present invention is explained in terms of a spectrum analysis application, various other applications exist wherein quantities to be compared may be defined by a plurality of parameters. In general, quantities suitably represented either in graphical form or by an n-dimensional vector representation may be compared by the present system.

FIG. 2 represents the spectrum of an unknown element in graphical form, the wavelength of the light in angstroms being plotted along the horizontal or x axis and the amplitude of the light transmitted thereby being plotted along the vertical or y axis. Since each known element has a particular characteristic spectrum, the unknown element may be identified by comparing its spectrum with a library of spectra of known elements. The library comprises a plurality of unit records, each of which corresponds to a known element. Such a library may contain 10,000 or more unit records. The spectra of nearly all elements may be defined quite accurately by considering the response of the element at a relatively large number of wavelengths. In the present example, the spectra are defined by 100 sample points which are referred to as descriptors D and correspond to preselected wavelengths spaced substantially equally throughout the total spectrum.
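The reduction of a measured curve to 100 descriptors can be sketched in software. This is a minimal sketch under stated assumptions: the measured curve is a list of (wavelength, amplitude) points sorted by wavelength, values between measured points are found by linear interpolation, and amplitudes are quantized to the 0 to 15 range used for the descriptors. The helper `to_descriptors` and its `full_scale` parameter are hypothetical.

```python
from typing import List, Sequence


def to_descriptors(wavelengths: Sequence[float], amplitudes: Sequence[float],
                   count: int = 100, full_scale: float = 1.0, top: int = 15) -> List[int]:
    """Sample a measured curve at `count` equally spaced wavelengths (linear
    interpolation) and quantize each sample to an integer 0..top."""
    if len(wavelengths) != len(amplitudes) or len(wavelengths) < 2:
        raise ValueError("need at least two (wavelength, amplitude) points")
    lo, hi = wavelengths[0], wavelengths[-1]
    step = (hi - lo) / (count - 1)
    descriptors = []
    j = 0
    for i in range(count):
        x = lo + i * step
        # Move to the measured segment [wavelengths[j], wavelengths[j + 1]] containing x.
        while j < len(wavelengths) - 2 and wavelengths[j + 1] < x:
            j += 1
        x0, x1 = wavelengths[j], wavelengths[j + 1]
        y0, y1 = amplitudes[j], amplitudes[j + 1]
        y = y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        level = round(y / full_scale * top)
        descriptors.append(min(max(level, 0), top))  # clamp to the 4-bit range
    return descriptors
```

The same routine would serve for the unknown spectrum, whose quantized values become the weight settings W0 through W99.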

As shown diagrammatically in FIG. 1, the library in this instance is recorded on a 6-channel magnetic tape 11 which is movable in the direction of arrow 12 past a reading station 13. The 100 descriptors D0 through D99 which define the spectrum of the element are recorded serially on the tape 11, each descriptor D being recorded as a binary coded number. For purposes of explanation it is assumed that each descriptor D can vary from 0 to 15 units, and hence each descriptor is recorded in binary code in parallel on the first four channels, designated C1, C2, C4 and C8, of the magnetic tape. The 100 descriptors D0 through D99 which comprise the record being compared are therefore scanned serially by the read station 13. The tape unit 11 serves as one type of library means for magnetically storing each known quantity in terms of its descriptors in the form of binary coded representations. However, it should be obvious that other types of storage may be employed if desired.
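The parallel binary coding across channels C1, C2, C4 and C8 can be illustrated with a small Python sketch. It models one tape frame as a mapping from channel name to bit; the function names are hypothetical, and the clock channel and channel C6 are ignored here.

```python
from typing import Dict, List

# Channel names as in the text, paired with the binary place value each carries.
CHANNELS = (("C1", 1), ("C2", 2), ("C4", 4), ("C8", 8))


def encode_descriptor(value: int) -> Dict[str, int]:
    """Return the bits written across channels C1, C2, C4, C8 for one descriptor."""
    if not 0 <= value <= 15:
        raise ValueError("a descriptor must lie in 0..15")
    return {name: int((value & weight) != 0) for name, weight in CHANNELS}


def decode_frame(frame: Dict[str, int]) -> int:
    """Inverse of encode_descriptor: sum the place values of the channels that are set."""
    return sum(weight for name, weight in CHANNELS if frame[name])


def encode_record(descriptors: List[int]) -> List[Dict[str, int]]:
    """One frame per descriptor, D0 first, in the order the frames pass the read station."""
    return [encode_descriptor(d) for d in descriptors]
```

For instance, a descriptor of 13 sets channels C1, C4 and C8 and leaves C2 clear.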

The system further includes means 15 for converting each descriptor Dn to a corresponding analog voltage VDn. In the illustrated embodiment, this means comprises a plurality of magnetic transducers T1, T2, T4 and T8 adapted to scan channels C1, C2, C4 and C8 of the tape 11, a plurality of 4-stage registers R0 through R99 and a plurality of gate units G0 through G99. The four input terminals of the registers R are connected to the transducers T1, T2, T4 and T8 in parallel through respective gating units G0 through G99. A counter or sequencer 19 is also provided to operate the gate units G in succession. The counter, as shown, operates in response to a clock signal generated by means of transducer CL scanning a suitably recorded clock channel designated CC on tape 11, so that the digital representations of each descriptor D are transferred to the appropriate register R in succession. Since any suitable counter capable of counting from 0 to 100 may be employed for counter 19, it does not appear necessary to explain its structure and operation in detail. Reference may be had to many of the standard texts on computers for further details.

The gate units G0 through G99 are all identical, and hence only one is shown in detail in FIG. 3. Each gate unit G comprises four conventional AND gates A1 through A4. Each AND gate has a pair of input terminals 20a and 20b and an output terminal 20c. One terminal 20a of each AND gate is connected to a different one of the transducers T1 through T8, the remaining four terminals 20b of the AND gates being connected to the appropriate output tap 21 of the counter 19 by means of a line 22.

The output terminals 20c of the AND gates A1 through A4 are connected respectively to the four input lines 23 of the associated register R. Each of the registers is similar, so only one is shown in detail in FIG. 4. Each register comprises four stages designated S1, S2, S4 and S8. Any suitable bi-stable device may be employed to function as one stage of the register, such as a conventional flip-flop circuit having an input terminal 24, a reset terminal 25 and an output terminal 26. It is assumed that a pulse supplied to the input terminal 24 changes the level of the output voltage at terminal 26 from a low state to a high state, and that a pulse applied to the reset terminal 25 causes the voltage of the output terminal 26 to return to a low state.
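The interplay of counter 19, the AND-gate units and the flip-flop registers can be modelled behaviourally in Python. This sketch is an idealization, not a circuit model: each stage is a boolean, a frame is a mapping from place value (1, 2, 4, 8) to bit, and counter tap n is taken to be high only while frame n is under the transducers. All class and function names are hypothetical.

```python
from typing import Dict, List

WEIGHTS = (1, 2, 4, 8)  # stages S1, S2, S4, S8


class Register:
    """Four-stage register R; each stage is modelled as a set/reset flip-flop."""

    def __init__(self) -> None:
        self.stage = {w: False for w in WEIGHTS}

    def pulse(self, weight: int) -> None:
        self.stage[weight] = True   # pulse at input 24: output 26 goes high

    def reset(self) -> None:
        for w in WEIGHTS:
            self.stage[w] = False   # pulse at reset 25: output 26 goes low

    def value(self) -> int:
        return sum(w for w in WEIGHTS if self.stage[w])


def gate_unit(transducer_bits: Dict[int, int], tap_high: bool) -> Dict[int, bool]:
    """Gate unit G: four AND gates, each combining one transducer bit with the counter tap."""
    return {w: bool(transducer_bits[w]) and tap_high for w in WEIGHTS}


def load_registers(frames: List[Dict[int, int]], registers: List[Register]) -> None:
    """Step the counter once per frame; only gate unit Gn, on tap n, passes bits to Rn."""
    for count, frame in enumerate(frames):
        for n, register in enumerate(registers):
            for w, bit in gate_unit(frame, tap_high=(n == count)).items():
                if bit:
                    register.pulse(w)
```

Because only one tap is high at a time, each frame lands in exactly one register, which is what lets a single set of four transducers fill all 100 registers.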

The output terminals 26 of the four stages S1, S2, S4 and S8 are connected in parallel through resistors r1 through r4. The values of these resistors r are selected so that 16 separate, equally spaced voltage levels VD may be obtained at junction point 29, depending on the states of the four stages S of register R. It will thus be seen that the voltage VDn at point 29 is an analog voltage representation of the corresponding descriptor Dn, and the portion of the system described so far merely functions to convert a digital representation of a descriptor Dn to an analog voltage VDn.
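One way to choose the resistors r1 through r4 is binary weighting. The sketch below assumes, since the text gives no values, that the conductances of the four resistors stand in the ratio 1:2:4:8, that a set stage drives its resistor at a fixed high voltage and a reset stage at 0 V, and that junction 29 is unloaded. The unloaded junction voltage then follows Millman's theorem. The names are hypothetical.

```python
from typing import List, Sequence


def node_voltage(source_volts: Sequence[float], conductances: Sequence[float]) -> float:
    """Unloaded voltage of a node fed from several sources through conductances
    (Millman's theorem): sum(V_i * G_i) / sum(G_i)."""
    return sum(v * g for v, g in zip(source_volts, conductances)) / sum(conductances)


def register_levels(v_high: float = 15.0, unit_conductance: float = 1.0) -> List[float]:
    """Voltage at junction 29 for all 16 register states, assuming stage S_k drives
    its resistor at v_high when set (0 V when reset) through conductance k * unit."""
    weights = (1, 2, 4, 8)
    conductances = [w * unit_conductance for w in weights]
    levels = []
    for code in range(16):
        outputs = [v_high if code & w else 0.0 for w in weights]
        levels.append(node_voltage(outputs, conductances))
    return levels
```

With a 15 V high level the 16 states produce 0, 1, 2, ... 15 V, that is, equally spaced levels proportional to the stored descriptor.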

The system further comprises a plurality of weighting units W0 through W99, each of which is similar, and hence only one is described. The weighting unit W0 comprises an input terminal 31, an output terminal 32 and a variable resistor unit 33 connected therebetween, having 16 separate positions designated 0 through 15, each of which may be selected individually.

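Since the input voltage VDn to weighting unit Wn may likewise take any of the 16 levels 0 through 15, the output voltage VDWn of the weighting unit Wn, being proportional to the product of the two, lies on a scale of equally spaced voltage levels from 0 to 225 units, although not every level on that scale can be produced as such a product.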

The output terminal 32 of each of the weighting units is connected in parallel to one input terminal 39 of the differential amplifier 40. The voltage VDWS applied to input terminal 39 of the differential amplifier 40 represents the sum of the weighted analog voltages VDW0 through VDW99 and corresponds to the numerator of Equation 1 referred to earlier.
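The weighting and summing stage can be idealized as multiply-and-add. In this Python sketch each weighting unit is assumed to behave as an ideal product of its 0 to 15 input level and its 0 to 15 setting, and the parallel connection as an exact sum; the function names are hypothetical. The set `REACHABLE_LEVELS` shows that a single unit's output spans 0 to 225 with gaps.

```python
from typing import List, Sequence, Set


def weighted_voltages(d: Sequence[int], w: Sequence[int]) -> List[int]:
    """VDWn for each n, modelled as the product Dn * Wn (both on a 0..15 scale)."""
    if len(d) != len(w):
        raise ValueError("one weighting unit per descriptor")
    for value in (*d, *w):
        if not 0 <= value <= 15:
            raise ValueError("descriptors and weights lie in 0..15")
    return [dn * wn for dn, wn in zip(d, w)]


def summed_voltage(d: Sequence[int], w: Sequence[int]) -> int:
    """VDWS at terminal 39: the numerator of Equation 1."""
    return sum(weighted_voltages(d, w))


# Every level a single weighting unit can produce (0 through 225, not all integers).
REACHABLE_LEVELS: Set[int] = {a * b for a in range(16) for b in range(16)}
```

With 100 units at full scale the summed voltage reaches 100 × 225 = 22,500 units, while a level such as 17 (a prime larger than 15) never appears at a single unit's output.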

The other terminal 41 of the differential amplifier 40 is supplied with a weighted control voltage VWC proportional to the denominator of Equation 1. This weighted control voltage may be generated in any suitable manner.

For example, if the denominator of the equation (apart from K) is written as

$$\sqrt{D_0^2 + D_1^2 + D_2^2 + \cdots + D_{99}^2}\,\sqrt{W_0^2 + W_1^2 + W_2^2 + \cdots + W_{99}^2}$$

it will be seen that it is possible to predetermine the factor under the first radical for each unit record, and this factor is recorded in the first eight bit spaces b0 through b7 of channel C6. The remaining bit spaces of channel C6 may be employed for recording other information, such as the identity of the particular unit record, as explained later in the specification. The means 43 for converting the digital representation of the factor $\sqrt{D_0^2 + D_1^2 + D_2^2 + \cdots + D_{99}^2}$ to an analog control voltage VC is similar to the means 15 employed in converting the digital representations of the descriptors D to analog voltages VD. As shown, means 43 comprises transducer T6 connected to an eight-stage register R77 through AND gates A10 through A17. The AND gates are opened in succession by the first eight pulses C00 through C07 supplied by counter 19, so that the recorded digital representation is transferred to the eight stages of the register R77. Each output stage of the register R77 is connected in parallel through suitable weighting resistors r10 through r17. The voltage VC is therefore proportional to the first part of the denominator.
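Precomputing the record factor is simple arithmetic, and its largest possible value, √(100 × 15²) = 150, fits in eight bit spaces. The Python sketch below rounds the factor to the nearest integer before storage and treats b0 as the least significant bit; both choices are assumptions, as are the function names.

```python
import math
from typing import List, Sequence


def record_factor(descriptors: Sequence[int]) -> int:
    """sqrt(D0^2 + ... + D99^2), rounded to the nearest integer for storage (assumed)."""
    return round(math.sqrt(sum(d * d for d in descriptors)))


def to_bit_spaces(value: int, width: int = 8) -> List[int]:
    """Bits for spaces b0..b7, b0 taken as least significant (an assumption)."""
    if not 0 <= value < 2 ** width:
        raise ValueError("value does not fit in the bit spaces")
    return [(value >> i) & 1 for i in range(width)]


def from_bit_spaces(bits: Sequence[int]) -> int:
    """Reassemble the stored factor; the analog voltage VC is proportional to this value."""
    return sum(bit << i for i, bit in enumerate(bits))
```

Rounding to an integer introduces a small quantization error in the denominator, one reason a practical setting of 100 percent may need a little tolerance at the level sensing stage.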

The second portion of the denominator, namely,

$$\sqrt{W_0^2 + W_1^2 + W_2^2 + \cdots + W_{99}^2},$$

may also be predetermined, since each of the values W0 through W99 is known prior to a comparison operation, Wn being the amplitude, from 0 to 15, of the unknown spectrum at the preassigned wavelength corresponding to descriptor Dn. A weighted voltage VWC proportional to the denominator of Equation 1 may therefore be obtained by weighting the output voltage VC supplied by means 43 with a conductance factor proportional to $\sqrt{W_0^2 + W_1^2 + W_2^2 + \cdots + W_{99}^2}$. A potentiometer 44, connected between the output terminal of the converting means 43 and the input terminal 41 of the differential amplifier 40, performs this weighting of voltage VC by means of 150 equal steps corresponding to 150 different values of conductance.
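The setting of potentiometer 44 is computed once per unknown, before the scan. This Python sketch picks the nearest of its 150 steps and then scales VC by that setting; the names are hypothetical, and treating the potentiometer as an exact multiplier is an idealization.

```python
import math
from typing import Sequence


def pot44_setting(weights: Sequence[int], steps: int = 150) -> int:
    """Nearest potentiometer step to sqrt(W0^2 + ... + W99^2), capped at `steps`."""
    return min(round(math.sqrt(sum(w * w for w in weights))), steps)


def weighted_control_voltage(vc: float, setting: int) -> float:
    """VWC before the percentage match adjustment: VC scaled by the potentiometer setting."""
    return vc * setting
```

Since VC tracks the record's own factor and the setting tracks the unknown's, their product is the full denominator of Equation 1 with K = 1.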

It can be shown mathematically that, with K = 1, Equation 1 reaches its maximum value of unity when there is a direct match of Dn and Wn for each of the parameters. Stated somewhat differently, if each of the weighting units W is set at some arbitrarily chosen conductance setting and potentiometer 44 is set at its appropriate setting corresponding to $\sqrt{W_0^2 + W_1^2 + \cdots + W_{99}^2}$, the output voltage of the differential amplifier 40 will be a maximum when a direct match exists between each of the analog voltage levels VDn and the corresponding weight Wn. In addition, any known record whose parameters correspond to the respective parameters of the unknown in the same proportion will also cause the differential amplifier to supply a maximum voltage, since the input voltages are equal.
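The maximum of unity follows from the Cauchy-Schwarz inequality; the short derivation below is a standard argument, taking K = 1 and assuming neither the record nor the unknown is all zeros:

$$\left(\sum_{n=0}^{99} D_n W_n\right)^2 \le \left(\sum_{n=0}^{99} D_n^2\right)\left(\sum_{n=0}^{99} W_n^2\right)$$

Because every Dn and Wn is non-negative, the numerator of Equation 1 is non-negative, and taking square roots gives 0 ≤ R ≤ 1, with R = 1 exactly when Dn = λWn for every n and some λ > 0. The proportional case follows because a common scale factor cancels:

$$\frac{\sum_n (\lambda D_n) W_n}{\sqrt{\sum_n (\lambda D_n)^2}\,\sqrt{\sum_n W_n^2}} = \frac{\lambda \sum_n D_n W_n}{\lambda \sqrt{\sum_n D_n^2}\,\sqrt{\sum_n W_n^2}} = R$$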

As mentioned previously, one terminal 39 of the differential amplifier is supplied with a voltage VDWS which is proportional to the numerator of Equation 1, while the other terminal 41 is supplied with a weighted control voltage VWC proportional to the denominator of Equation 1. The differential amplifier, as shown schematically in FIG. 5, has a pair of output terminals 50 and 51. Terminal 50 is at a maximum value when the input signals are equal, and terminal 51 is at a minimum value when the signals are equal. Terminal 50 is connected to a suitable level sensing unit 53, such as a Schmitt trigger, through a gating unit 55. Unit 53 generates an indicating signal IS, under control of a C0100 pulse from counter 19, when the voltage of terminal 50 exceeds a predetermined level.

The indicating signal IS generated by the level sensing unit 53 may be employed to control the entry of information concerning the identity of the record being compared to a print unit 60. Information as to the identity of the record is stored in channel C6 of the tape 11 in bit spaces b20 through b40 (not shown). Transducer T6, adapted to scan channel C6 of the tape, is connected to the identity storage unit 61 through a gate unit 62 which is opened in response to a signal C020 from the counter and closed by a signal C040 from the counter. The identity of each record scanned is therefore entered into the identity storage unit 61. The identity storage unit 61 is connected to the print unit 60 through a gate unit 64 which is operated under control of the indicating signal IS. The identity of a scanned record which causes an indicating signal IS is therefore printed out by the print unit 60.

In order to search for records which have a predetermined degree of correspondence to the unknown, a percentage match potentiometer 66 is inserted between the input terminal 41 of the differential amplifier 40 and the output terminal of the weighting potentiometer 44. Potentiometer 66 serves the function of the factor K in Equation 1 by further weighting the voltage VWC from potentiometer 44. For example, if potentiometer 66 is set at an 80 percent position, each voltage VWC supplied to terminal 41 of the differential amplifier 40 is decreased by 20 percent. As a result, any record which previously provided a voltage VDWS which was 80 percent or more of the voltage obtained at VWC when there is a one-to-one correspondence would now cause the differential amplifier 40 to operate the level sensing unit 53, since the input voltage at 39 is equal to or greater than the voltage at input 41.
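The effect of the percentage match setting reduces to a single comparison: a record is indicated when the numerator is at least K times the full denominator, which is the same as requiring the K = 1 correlation to be at least K. This Python sketch expresses that decision; the function name is hypothetical, and an exact floating-point comparison stands in for the analog level sensing.

```python
import math
from typing import Sequence


def indicating_signal(d: Sequence[float], w: Sequence[float], k: float) -> bool:
    """True when VDWS >= K * VWC, i.e. when the record is at least `k` similar.

    `k` plays the part of percentage match potentiometer 66 (0.8 for 80 percent).
    """
    numerator = sum(dn * wn for dn, wn in zip(d, w))                       # terminal 39
    denominator = math.sqrt(sum(x * x for x in d)) * math.sqrt(sum(x * x for x in w))
    return denominator > 0 and numerator >= k * denominator               # terminal 41
```

A record [1, 1] compared with an unknown [1, 0] has a correlation of about 0.707, so it is indicated at a 70 percent setting but not at 80 percent. In software, a setting of exactly 100 percent can miss proportional records through rounding, much as the hardware relies on a finite sensing level.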

Assuming a library of records of known elements has been established and is recorded on tape 11, each record comprising 100 binary coded digital representations corresponding to the descriptors D employed to define the element, together with other control data on channels 5 and 6, and assuming further that a spectrum of an unknown element has been obtained, the operation of the system is substantially as follows. The unknown spectrum is first defined in terms of the parameters on which the records in the library are based. The weighting units W0 through W99 are then set to the appropriate settings from 0 to 15 corresponding, respectively, to the values of the 100 parameters of the unknown element. The factor $\sqrt{W_0^2 + W_1^2 + W_2^2 + \cdots + W_{99}^2}$ is calculated and potentiometer 44 is set at the appropriate position, from 0 to 150, corresponding to this factor. The percentage match means 66 is adjusted to the desired degree of correspondence, which will be assumed to be 100 percent in the present example.

As the tape 11 moves past the read station 13, each binary coded digital representation of the 100 descriptors D0 through D99 is converted to 100 corresponding analog voltage levels VD0 through VD99 by the digital-to-analog converting means 15, transducer CL supplying clock pulses to counter 19, which supplies pulses to gating units G0 through G99 in succession. Each voltage VDn is then weighted by the associated weighting unit Wn to provide the weighted voltage VDWn. The sum of the 100 weighted voltages VDW0 through VDW99 is supplied to terminal 39 of the differential amplifier 40 as voltage VDWS. As mentioned previously, this voltage corresponds to the numerator of Equation 1.

Simultaneously, as the descriptors D0 through D99 are being converted to weighted analog voltages, transducer T6 supplies binary representations of the factor $\sqrt{D_0^2 + D_1^2 + \cdots + D_{99}^2}$ (recorded on channel 6 in bit spaces b0 through b7 of each record) to converter means 43, which generates the analog control voltage VC. Voltage VC is supplied to potentiometer 44, which weights this voltage by a factor proportional to $\sqrt{W_0^2 + W_1^2 + \cdots + W_{99}^2}$ and hence develops voltage VWC proportional to the denominator of Equation 1. Likewise, transducer T6 supplies the identity of each record scanned (as recorded on channel 6 in bit spaces b20 through b40 of the record) to the identity storage unit 61 through the gate unit 62 under the control of counter pulses C020 and C040.

The output voltage of the differential amplifier 40 varies inversely with the difference between voltages VDWS and VWC and is a maximum when the voltages are equal, which occurs, for example, when the 100 parameters of the spectrum of the unknown, as represented by weights W0 through W99, are in correspondence with the respective 100 parameters of a known element in the library or, in other words, when the spectra of the two elements have the same wave shape.

The output voltage of the differential amplifier 40 is sampled immediately after each record is scanned, by means of a C0100 pulse from the counter 19 via the gate unit 55. Assuming the percentage match potentiometer 66 was adjusted for 100 percent correspondence, the indicating signal IS is generated when the two input signals VDWS and VWC are equal, and the identity of the record, as stored in the identity storage unit 61, is caused to be printed by the print unit 60. The C0100 pulse also resets all the registers at the end of each record.
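The whole single-pass operation can be summarized as a loop over records. This Python sketch is a behavioural summary under stated assumptions: a record is an (identity, descriptors) pair, the analog stages are exact arithmetic, and a small `tolerance` stands in for the finite sensing level of unit 53. The names `scan_library` and `Record` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Record = Tuple[str, Sequence[int]]  # (identity from bit spaces b20..b40, descriptors D0..D99)


def scan_library(library: Sequence[Record], weights: Sequence[int],
                 k: float = 1.0, tolerance: float = 1e-9) -> List[str]:
    """One pass over the library; returns the identities that would be printed."""
    w_norm = math.sqrt(sum(w * w for w in weights))               # setting of potentiometer 44
    printed = []
    for identity, descriptors in library:                         # one record per tape pass
        vdws = sum(d * w for d, w in zip(descriptors, weights))   # voltage at terminal 39
        vc = math.sqrt(sum(d * d for d in descriptors))           # factor from bit spaces b0..b7
        vwc = k * vc * w_norm                                     # after potentiometers 44 and 66
        # Sampled at the C0100 pulse; the tolerance stands in for the sensing level.
        if vwc > 0 and vdws >= vwc - tolerance:
            printed.append(identity)                              # gate 64 passes it to print unit 60
    return printed
```

With an unknown of [1, 2, 3], a record [2, 4, 6] is printed at a 100 percent setting because it is proportional, while [3, 2, 1] (correlation 10/14, about 0.71) appears only once the setting drops to about 71 percent or lower.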

If the percentage match potentiometer 66 is set at a value less than 100 percent, say 80 percent, the voltage VWC produced by records which are more than 80 percent close to the unknown will be smaller than the corresponding voltage VDWS. The maximum output voltage of the differential amplifier previously obtained for a setting of 100 percent will of course be increased. However, the operating level of sensing device 53 remains constant. The system can readily operate at a rate of 10,000 cycles per second. Assuming a library of 10,000 records, each having 100 parameters, a single scan through the library would consume something less than two minutes.
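The two-minute figure can be checked with simple arithmetic. This sketch assumes one character position is read per clock cycle at 10,000 positions per second and 101 positions per record (counter 19 counting 0 through 100), and it ignores any gaps between records; these are assumptions rather than figures from the text.

```python
def scan_seconds(records: int, positions_per_record: int, rate_per_second: float) -> float:
    """Time for one pass, assuming one character position is read per clock cycle."""
    return records * positions_per_record / rate_per_second
```

Under these assumptions, `scan_seconds(10_000, 101, 10_000)` gives 101.0 seconds, about 1.7 minutes.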

An important feature of the present system arises from the fact that, in addition to a particular setting of the 100 weighting units providing a maximum voltage for a direct match, other records will provide voltages which are predetermined percentages of the maximum voltage in accordance with how closely they resemble the unknown element. It is therefore possible to adjust the setting of the percentage match potentiometer 66 so that the identities of all records which provide a voltage of, for example, at least 90 percent of this maximum voltage will be printed out by print unit 60 during a single scan through the library.

In the illustrated embodiment, many of the circuit components which per se are old in the art have been shown in block diagram form, since their specific details form no part of the present invention. Hence, these components are described and explained merely in terms of their function, since reference may be had to many standard texts for operational details of specific circuits capable of performing the recited function.

While there have been shown and described and pointed out the fundamental novel features of the invention as applied to a preferred embodiment, it will be understood that various omissions and substitutions and changes in the form and details of the device illustrated and in its operation may be made by those skilled in the art without departing from the spirit of the invention. It is the intention, therefore, to be limited only as indicated by the scope of the following claims.

What is claimed is:

1. A system for comparing an unknown item of information defined by a plurality of parameters with a library of known items defined by similar parameters to determine those items which have a predetermined degree of similarity to the unknown item, comprising in combination, means for storing binary coded digital representations of the numerical amplitude of said parameters of said known items, means for converting each said stored digital representation to a corresponding analog voltage level, a plurality of units coupled to said converting means for weighting each said analog voltage level by a predetermined weight selected in accordance with the corresponding parameter of said unknown item to provide a plurality of weighted voltages, means connecting the output of said weighting units in parallel to sum said weighted voltages, means for generating a control voltage for each stored item proportional to the square root of the sum of the squares of said numerical amplitude of said parameters defining said item, means connected to said generating means for weighting said control voltage by a conductance factor proportional to the square root of the sum of the squares of said predetermined weights to provide a weighted control voltage, a voltage comparing means having a pair of input terminals and an output terminal for providing an indicating signal only when the voltage applied to one terminal is equal to or greater than the voltage applied to the other terminal, means for supplying said summed voltage to one of said terminals, and

correspondence desired between said unknown item and said known items.

3. The invention recited in claim 1 in which said storing means comprises a magnetic tape.

4. The invention recited in claim 1 in which said converting means comprises a plurality of magnetic transducers, a plurality of digital to analog converters and an electronic distributor for connecting said transducers to each of said converters in succession.

5. The invention recited in claim 1 in which each said weighting unit comprises a potentiometer having a plurality of taps corresponding to predetermined conductance settings and a contact movable to a selected one of said taps.

6. The invention recited in claim 1 in which said control voltage generating means comprises means for storing a binary coded digital representation of said control voltage, a digital to analog converter, and transducer means responsive to said control voltage representations for supplying signals to said converter.

7. The invention recited in claim 1 further comprising means for temporarily storing the identity of each known item being compared and means under control of said indicating signal for providing a printed record of each item which has said predetermined degree of correspondence.


